Analysis of constant-Q filterbank based representations for speech emotion recognition
Authors
Abstract
This work analyzes the constant-Q filterbank-based time-frequency representations for speech emotion recognition (SER). Constant-Q filterbank provides non-linear spectro-temporal representation with higher frequency resolution at low frequencies. Our investigation reveals how the increased low-frequency resolution benefits SER. The time-domain comparative analysis between short-term mel-frequency spectral coefficients (MFSCs) and constant-Q filterbank-based features, namely constant-Q transform (CQT) and continuous wavelet transform (CWT), reveals that constant-Q representations provide higher time-invariance at low frequencies. This provides increased robustness against emotion-irrelevant temporal variations in pitch, especially for low-arousal emotions. The corresponding frequency-domain analysis over different emotion classes shows better resolution of pitch harmonics in constant-Q-based time-frequency representations than in MFSC. These advantages of constant-Q representations are further consolidated by SER performance in the extensive evaluation of features over four publicly available databases with six advanced deep neural network architectures as the back-end classifiers. Our inferences in this study hint toward the suitability and potential of constant-Q features for SER.
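The claim that a constant-Q filterbank gives finer frequency resolution at low frequencies can be illustrated with a minimal sketch. It uses the standard geometric spacing of constant-Q center frequencies; the function name `constant_q_bins` and every parameter value (minimum frequency, bins per octave, number of bins, sample rate) are assumptions chosen for illustration, not the configuration used in the paper.

```python
import math


def constant_q_bins(f_min=32.7, bins_per_octave=12, n_bins=48, sample_rate=16000):
    """Return the quality factor Q and (center_hz, bandwidth_hz, window_samples) per bin.

    All default values are illustrative assumptions, not the paper's settings.
    """
    # Q is the ratio of center frequency to bandwidth; it is the same for every bin.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    bins = []
    for k in range(n_bins):
        # Center frequencies are geometrically spaced.
        center = f_min * 2.0 ** (k / bins_per_octave)
        # Bandwidth grows in proportion to the center frequency.
        bandwidth = center / q
        # The analysis window needed to resolve this bandwidth shrinks as frequency rises.
        window = math.ceil(q * sample_rate / center)
        bins.append((center, bandwidth, window))
    return q, bins


if __name__ == "__main__":
    q, bins = constant_q_bins()
    print(f"Q = {q:.3f}")
    for center, bandwidth, window in (bins[0], bins[-1]):
        print(f"center={center:8.2f} Hz  bandwidth={bandwidth:6.2f} Hz  window={window} samples")
```

Under these assumed settings, the lowest bin has the narrowest bandwidth and the longest analysis window, while the highest bin has the widest bandwidth and the shortest window. This is the trade-off the abstract refers to: finer frequency resolution at low frequencies, where pitch harmonics lie.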
Similar resources
Speech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...
Learning Corpus-Invariant Discriminant Feature Representations for Speech Emotion Recognition
As a hot topic of speech signal processing, speech emotion recognition methods have been developed rapidly in recent years. Some satisfactory results have been achieved. However, it should be noted that most of these methods are trained and evaluated on the same corpus. In reality, the training data and testing data are often collected from different corpora, and the feature distributions of di...
Reusing Neural Speech Representations for Auditory Emotion Recognition
Acoustic emotion recognition aims to categorize the affective state of the speaker and is still a difficult task for machine learning models. The difficulties come from the scarcity of training data, general subjectivity in emotion perception resulting in low annotator agreement, and the uncertainty about which features are the most relevant and robust ones for classification. In this paper, we...
Improving of Feature Selection in Speech Emotion Recognition Based-on Hybrid Evolutionary Algorithms
One of the important issues in speech emotion recognition is selecting appropriate feature sets to improve the detection rate and classification accuracy. In previous studies, researchers tried to select appropriate features for classification using feature selection and feature-space reduction methods, such as Fisher and PCA. In this research, a hybrid evolutionary algorit...
A Gender-Based Pragmatic Analysis of the Use of English Compliment Responses by Iraqi EFL Students: A Speech Act Perspective
Compliments are speech acts that people use in their daily lives to establish friendships or maintain amicable relations. The mechanism of complimenting is not specific to English or any other language; it is a universal phenomenon present in all languages. Where languages and cultures differ is in how this speech act is responded to in discourse. This study examines the variety of English and Arabic responses to the speech act of complimenting...
Journal
Journal title: Digital Signal Processing
Year: 2022
ISSN: 1051-2004, 1095-4333
DOI: https://doi.org/10.1016/j.dsp.2022.103712